Label Filters for Large Scale Multilabel Classification

Authors

  • Alexandru Niculescu-Mizil
  • Ehsan Abbasnejad
Abstract

When assigning labels to a test instance, most multilabel and multiclass classifiers evaluate every single label to decide whether it is relevant. This linear scan over labels becomes prohibitive when the number of labels is very large. To alleviate this problem we propose a two-step approach in which computationally efficient label filters pre-select a small set of candidate labels before the base multiclass or multilabel classifier is applied. The label filters select candidate labels by projecting a test instance onto a filtering line and retaining only the labels that have training instances in the vicinity of this projection. The filter parameters are learned directly from data by solving a constrained optimization problem, and are independent of the base multilabel classifier. The proposed label filters can be used in conjunction with any multiclass or multilabel classifier that requires a linear scan over the labels, and speed up prediction by orders of magnitude without significant impact on performance.
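The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the filtering direction `w` is already given (the paper learns it by constrained optimization, which is not reproduced here), and it interprets "in the vicinity of the projection" as "inside the interval spanned by that label's projected training instances, widened by an optional `margin`". The names `project`, `fit_intervals`, `candidate_labels`, and `margin` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Interval = Tuple[float, float]


def project(w: Vector, x: Vector) -> float:
    """Project instance x onto the filtering line with direction w (dot product)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_intervals(w: Vector, X: List[Vector], Y: List[Set[int]]) -> Dict[int, Interval]:
    """For each label, record the [min, max] range of its training projections.

    Assumption: w is supplied by the caller; learning w is out of scope here.
    """
    intervals: Dict[int, Interval] = {}
    for x, labels in zip(X, Y):
        p = project(w, x)
        for label in labels:
            lo, hi = intervals.get(label, (p, p))
            intervals[label] = (min(lo, p), max(hi, p))
    return intervals


def candidate_labels(
    w: Vector, intervals: Dict[int, Interval], x: Vector, margin: float = 0.0
) -> Set[int]:
    """Keep only labels whose (widened) interval contains the test projection."""
    p = project(w, x)
    return {
        label
        for label, (lo, hi) in intervals.items()
        if lo - margin <= p <= hi + margin
    }


# Toy data: four training instances, three labels, a hand-picked direction.
w = [1.0, 0.0]
X_train = [[0.0, 1.0], [1.0, 0.0], [5.0, 2.0], [6.0, 1.0]]
Y_train = [{0}, {0, 1}, {2}, {2}]
intervals = fit_intervals(w, X_train, Y_train)

# Only the surviving candidates would be passed on to the base classifier.
print(candidate_labels(w, intervals, [0.5, 3.0]))
print(candidate_labels(w, intervals, [5.5, 0.0]))
```

In this sketch the filter simply scans the stored intervals. The point is that the expensive base classifier is evaluated only on the labels that survive, rather than on every label.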


Similar resources

Robust Bloom Filters for Large MultiLabel Classification Tasks

This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive appli...


Efficient Pairwise Multilabel Classification for Large-Scale Problems in the Legal Domain

In this paper we applied multilabel classification algorithms to the EUR-Lex database of legal documents of the European Union. On this document collection, we studied three different multilabel classification problems, the largest being the categorization into the EUROVOC concept hierarchy with almost 4000 classes. We evaluated three algorithms: (i) the binary relevance approach which independ...


Adaptive Large Margin Training for Multilabel Classification

Multilabel classification is a central problem in many areas of data analysis, including text and multimedia categorization, where individual data objects need to be assigned multiple labels. A key challenge in these tasks is to learn a classifier that can properly exploit label correlations without requiring exponential enumeration of label subsets during training or testing. We investigate no...


Comparing multilabel classification methods for provisional biopharmaceutics class prediction

The biopharmaceutical classification system (BCS) is now well established and utilized for the development and biowaivers of immediate oral dosage forms. The prediction of BCS class can be carried out using multilabel classification. Unlike single label classification, multilabel classification methods predict more than one class label at the same time. This paper compares two multilabel method...


Graded Multilabel Classification: The Ordinal Case

We propose a generalization of multilabel classification that we refer to as graded multilabel classification. The key idea is that, instead of requesting a yes-no answer to the question of class membership or, say, relevance of a class label for an instance, we allow for a graded membership of an instance, measured on an ordinal scale of membership degrees. This extension is motivated by pract...




Publication date: 2017